Bayesian Analysis of Pensacola Beach Loggerhead Sea Turtle Nesting Data Utilizing 10:1 Pseudo-Absence to Presence Ratio

Authors: Audrey Moore, MS & Laura Sikes, MPH
Instructor: Dr. Samantha Seals
Date: April 23, 2025

Bayesian vs. Frequentist Approach

Bayesian

  • Begins with an existing belief (Prior), expressed as a probability distribution for the hypothesized outcome.
  • Introduces new data through a Likelihood for the hypothesized outcome.
  • Bayes' theorem combines the Prior and the Likelihood into the Posterior, an updated probability distribution for the hypothesized outcome.
  • The strength of the Prior and the sample size determine how much each contributes to the Posterior.

Frequentist

  • Analyzes the observed data alone, without a prior.
  • Results in a single calculated best estimate of each parameter.
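
The update from Prior to Posterior can be made concrete with a conjugate Beta-Binomial example. This is a minimal sketch in base R; the prior and the counts are made-up numbers chosen for illustration and are not the turtle nesting data.

```r
# Hypothetical numbers for illustration only (not the turtle data)
prior_a <- 2
prior_b <- 8            # Beta(2, 8) prior, prior mean 0.2
successes <- 12
trials <- 40            # made-up observed data, sample proportion 0.3

# Conjugate update: Beta prior + Binomial likelihood -> Beta posterior
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

prior_mean <- prior_a / (prior_a + prior_b)
data_prop  <- successes / trials
post_mean  <- post_a / (post_a + post_b)

# 80% credible interval from the 10th and 90th posterior percentiles
ci_80 <- qbeta(c(0.10, 0.90), post_a, post_b)

print(c(prior = prior_mean, data = data_prop, posterior = post_mean))
print(ci_80)
```

The posterior mean falls between the prior mean and the sample proportion. With more trials it moves toward the data, and with a stronger prior (larger `prior_a + prior_b`) it stays closer to the prior.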

Linear Regression (Normal)

| Frequentist | Bayesian |
|-------------|----------|
| Parameters treated as fixed constants | Parameters treated as random variables |
| Does not incorporate prior information | Incorporates prior information or beliefs |
| Confidence intervals for uncertainty | Credible intervals for uncertainty |
| Hypothesis testing using p-values | Hypothesis testing by interpreting posterior probabilities |
| Results express long-run frequencies of estimates over repeated sampling | Results express probabilities about parameter values |

Logistic Regression

  • Bayesian logistic regression, similar to its frequentist counterpart, is used to model a binary categorical response variable (Y) given a set of predictor values (X). The resulting model estimates the log-odds of Y, which we can rewrite in terms of odds and probability.

  • The Bayesian framework takes into account prior beliefs about the regression parameters, $\beta_0$ and $\beta_1$. Since these parameters can be any number on the real line, we are able to use the Normal distribution for both priors.

  • Bayesian logistic regression involves updating these priors with the likelihood of the observed data so that we can make inferences about the relationship between X and Y.
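
To show how a fitted log-odds equation is rewritten in terms of odds and probability, here is a minimal sketch in base R. The intercept and slope are the point estimates reported for the unadjusted foreshore slope model in this document; the foreshore slope values in `fs` are hypothetical inputs chosen only for illustration.

```r
# Point estimates from the unadjusted foreshore slope model in this document
b0 <- -2.952
b1 <- 0.112

# Hypothetical foreshore slope values (illustration only)
fs <- c(0, 5, 10)

log_odds <- b0 + b1 * fs        # linear predictor on the log-odds scale
odds     <- exp(log_odds)       # odds of nesting
prob     <- odds / (1 + odds)   # probability of nesting, same as plogis(log_odds)

print(data.frame(fs, log_odds, odds, prob))
```

The slope is constant on the log-odds scale, but the resulting change in probability depends on where along the curve the starting value sits.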

Unadjusted Analyses

| Predictor | Model | Unadjusted 80% CI (Lower) | Unadjusted 80% CI (Upper) |
|:----------|:------|--------------------------:|--------------------------:|
| Beach Slope | $\text{log}\left( \frac{\pi}{1-\pi}\right) = -2.246 + 0.011\text{BS}$ | -0.161 | 0.170 |
| Dune Height | $\text{log}\left( \frac{\pi}{1-\pi}\right) = -1.364 - 0.234\text{DH}$ | -0.536 | 0.021 |
| Foreshore Slope | $\text{log}\left( \frac{\pi}{1-\pi}\right) = -2.952 + 0.112\text{FS}$ | 0.017 | 0.205 |
| Nest Distance | $\text{log}\left( \frac{\pi}{1-\pi}\right) = -0.018 + 0.016\text{ND}$ | -0.040 | 0.002 |
| Nest Elevation | $\text{log}\left( \frac{\pi}{1-\pi}\right) = -0.483 + 0.306\text{NE}$ | -0.891 | -0.102 |
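
Coefficients on the log-odds scale are easier to read as odds ratios. The sketch below exponentiates the unadjusted foreshore slope estimate and its 80% interval bounds as reported above; it is an illustration of the conversion, not additional model output.

```r
# Unadjusted foreshore slope: 80% interval bounds and point estimate (log-odds scale)
fs_log_odds <- c(lower = 0.017, estimate = 0.112, upper = 0.205)

# Exponentiating a log-odds coefficient gives the multiplicative change in odds
# for a one-unit increase in the predictor
odds_ratio <- exp(fs_log_odds)
print(round(odds_ratio, 3))
```

At the point estimate, each one-unit increase in foreshore slope multiplies the odds of nesting by about 1.12, and the whole 80% interval for the odds ratio sits above 1.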

Adjusted Analyses

$\text{log}\left( \frac{\pi}{1-\pi}\right) = -0.963 - 0.074\text{BS} - 0.232\text{DH} + 0.178\text{FS} + 0.012\text{ND} - 0.980\text{NE}$

| Predictor | Adjusted 80% CI (Lower) | Adjusted 80% CI (Upper) |
|-----------------|------------------------:|------------------------:|
| Beach Slope | -0.314 | 0.144 |
| Dune Height | -0.555 | 0.038 |
| Foreshore Slope | 0.073 | 0.287 |
| Nest Distance | -0.039 | 0.057 |
| Nest Elevation | -1.822 | -0.100 |
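
One simple way to read the adjusted intervals is to check which of them exclude 0. The sketch below re-enters the adjusted 80% interval bounds reported above into a data frame and labels each predictor; the column name `direction` and its labels are our own illustrative choices.

```r
# Adjusted 80% credible interval bounds as reported in this document
adjusted <- data.frame(
  predictor = c("Beach Slope", "Dune Height", "Foreshore Slope",
                "Nest Distance", "Nest Elevation"),
  lower = c(-0.314, -0.555, 0.073, -0.039, -1.822),
  upper = c(0.144, 0.038, 0.287, 0.057, -0.100)
)

# Label each interval by where it sits relative to 0
adjusted$direction <- with(
  adjusted,
  ifelse(lower > 0, "positive",
         ifelse(upper < 0, "negative", "includes 0"))
)

print(adjusted)
```

Only foreshore slope (positive) and nest elevation (negative) have 80% intervals that exclude 0 after adjustment.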

Diagnostics

First Alternative Model Considered

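$\text{log}\left( \frac{\pi}{1-\pi}\right) = -1.371 - 0.053\text{BS} - 0.208\text{DH} + 0.173\text{FS} + 0.042\text{ND} - 0.820\text{NE} - 0.015\text{ND}\times\text{NE}$

| Predictor | Adjusted 80% CI (Lower) | Adjusted 80% CI (Upper) |
|---------------------------------------|------------------------:|------------------------:|
| Beach Slope | -0.298 | 0.184 |
| Dune Height | -0.528 | 0.060 |
| Foreshore Slope | 0.067 | 0.281 |
| Nest Distance | -0.036 | 0.128 |
| Nest Elevation | -1.741 | 0.109 |
| Nest Distance $\times$ Nest Elevation | -0.048 | 0.013 |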
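
With an interaction in the model, the nest elevation effect changes with nest distance. The sketch below evaluates the implied elevation slope at a few nest distances using the point estimates above. The distances in `nd` are hypothetical, and because the 80% intervals for both the elevation and interaction terms include 0, the result describes the point estimates only.

```r
# Point estimates from the first alternative model in this document
b_ne    <- -0.820   # nest elevation main effect
b_nd_ne <- -0.015   # nest distance x nest elevation interaction

# Hypothetical nest distances (illustration only)
nd <- c(0, 10, 50)

# Change in log-odds per one-unit increase in elevation, at each distance
ne_slope <- b_ne + b_nd_ne * nd
print(data.frame(nd, ne_slope))
```

At these point estimates the negative association with elevation grows stronger as nest distance increases.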

Second Alternative Model Considered

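$\text{log}\left( \frac{\pi}{1-\pi}\right) = -3.326 + 0.354\text{FS} + 0.075\text{NE} - 0.126\text{FS}\times\text{NE}$

| Predictor | 80% CI (Lower) | 80% CI (Upper) |
|----------------------------------------|---------------:|---------------:|
| Foreshore Slope | 0.129 | 0.607 |
| Nest Elevation | -0.854 | 1.045 |
| Foreshore Slope $\times$ Nest Elevation | -0.268 | 0.005 |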
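
Because this model contains an interaction, the foreshore slope effect is not a single number. A short derivation from the point estimates above shows how it varies with nest elevation. This is a sketch only: the 80% interval for the interaction term includes 0, so the crossover value is tentative.

$$
\frac{\partial}{\partial\,\text{FS}}\,\text{log}\left( \frac{\pi}{1-\pi}\right) = 0.354 - 0.126\,\text{NE},
\qquad
0.354 - 0.126\,\text{NE} = 0 \iff \text{NE} = \frac{0.354}{0.126} \approx 2.81
$$

At the point estimates, a steeper foreshore raises the log-odds of nesting when nest elevation is below roughly 2.81 and lowers it above that value (in the elevation units recorded in the data).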

Conclusions

Unadjusted Models:
- The 80% credible interval for foreshore slope lies entirely above 0, suggesting the chance of nesting increases as foreshore slope increases.
- The 80% credible interval for nest elevation lies entirely below 0, suggesting the chance of nesting decreases as nest elevation increases.

Adjusted Model:
- After adjusting for all predictors, we see similar results:
  - Higher foreshore slope is associated with a higher probability of nesting.
  - Higher elevation is associated with a lower probability of nesting.

APPENDIX A

CODE

---
execute:
  echo: false
  warning: false
  message: false
  error: false
format: 
  revealjs:
    theme: serif
    embed-resources: true
    slide-number: true
    width: 1200
    height: 900
    df-print: paged
    html-math-method: katex
    self-contained: true
editor: source
pdf-separate-fragments: true
fig-align: center
css: |
  /* Apply font size to all tables within the document */
  .reveal table {
    font-size: 12px;  /* Adjust the font size */
  }

  /* Ensure that table headers and cells also use the new font size */
  .reveal table th, .reveal table td {
    font-size: 12px;  /* Apply the same font size to headers and data cells */
    padding: 5px;     /* Optional: Adjust padding for readability */
  }
---


#install.packages("gsheet")
library(gsheet)
library(bayesrules)
library(rstanarm)
library(bayesplot)
library(tidyverse)
library(broom.mixed)
library(tidybayes)


## Bayesian Analysis of Pensacola Beach Loggerhead Sea Turtle Nesting Data Utilizing 10:1 Pseudo-Absence to Presence Ratio

Authors: Audrey Moore, MS & Laura Sikes, MPH<br> Instructor: Dr. Samantha Seals <br> Date: April 23, 2025 <br>

## Bayesian vs. Frequentist Approach

**Bayesian**

-   Begins with an existing belief (Prior), expressed as a probability distribution for the hypothesized outcome.
-   Introduces new data through a Likelihood for the hypothesized outcome.
-   Bayes' theorem combines the Prior and the Likelihood into the Posterior, an updated probability distribution for the hypothesized outcome.
-   The strength of the Prior and the sample size determine how much each contributes to the Posterior.

**Frequentist**

-   Analyzes the observed data alone, without a prior.
-   Results in a single calculated best estimate of each parameter.

## Linear Regression (Normal)

| Frequentist                                                       | Bayesian                                                   |
|--------------------------------------|----------------------------------|
| Parameters treated as fixed constants                             | Parameters treated as random variables                     |
| Does not incorporate prior information                            | Incorporates prior information or beliefs                  |
| Confidence intervals for uncertainty                              | Credible intervals for uncertainty                         |
| Hypothesis testing using *p*-values                               | Hypothesis testing by interpreting posterior probabilities |
| Results express long-run frequencies of estimates over repeated sampling | Results express probabilities about parameter values       |

## Logistic Regression

-   Bayesian logistic regression, similar to its frequentist counterpart, is used to model a binary categorical response variable (Y) given a set of predictor values (X). The resulting model estimates the log-odds of Y, which we can rewrite in terms of odds and probability.

-   The Bayesian framework takes into account prior beliefs about the regression parameters, $\beta_0$ and $\beta_1$. Since these parameters can be any number on the real line, we are able to use the Normal distribution for both priors.

-   Bayesian logistic regression involves updating these priors with the likelihood of the observed data so that we can make inferences about the relationship between X and Y.
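
Written out for a single predictor, the model described above can be sketched as follows. The Normal scale of 2.5 matches the `normal(0, 2.5)` priors passed to `stan_glm()` in this analysis. The notation ($Y_i$, $X_i$, $\pi_i$) is ours, and, as we understand it, rstanarm applies the intercept prior with the predictors centered, so this display is a simplification.

$$
\begin{aligned}
Y_i \mid \beta_0, \beta_1 &\sim \text{Bernoulli}(\pi_i) \\
\text{log}\left( \frac{\pi_i}{1-\pi_i}\right) &= \beta_0 + \beta_1 X_i \\
\beta_0 &\sim N(0,\, 2.5^2), \qquad \beta_1 \sim N(0,\, 2.5^2)
\end{aligned}
$$

Solving the second line for the odds and the probability gives

$$
\frac{\pi_i}{1-\pi_i} = e^{\beta_0 + \beta_1 X_i},
\qquad
\pi_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}
$$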

## Unadjusted Analyses



# Read the 10:1 pseudo-absence data once from the shared Google Sheet
sheet_url <- "https://docs.google.com/spreadsheets/d/1ARgHYUwclO5weZf5lE1f79V8otZdc4CLSpdCSmwP8Ts/edit?gid=1512758558#gid=1512758558"
data10to1_raw <- gsheet2tbl(sheet_url)

# Complete cases with every column kept (not used by the models below)
test_wo_drop <- data10to1_raw %>%
  na.omit()

# Drop the two flagged columns, then keep complete cases for modeling
data10to1 <- data10to1_raw %>%
  select(-do_not_use_FS, -do_not_use_BS)

test_w_drop <- data10to1 %>%
  na.omit()


# Simulate the prior distribution
turtle_model_prior1 <- stan_glm(nested ~ beach_slope,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model1 <- update(turtle_model_prior1, prior_PD = FALSE)


# Simulate the prior distribution
turtle_model_prior2 <- stan_glm(nested ~ dune_ht,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model2 <- update(turtle_model_prior2, prior_PD = FALSE)




# Simulate the prior distribution
turtle_model_prior3 <- stan_glm(nested ~ foreshore_slope,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model3 <- update(turtle_model_prior3, prior_PD = FALSE)


# Simulate the prior distribution
turtle_model_prior4 <- stan_glm(nested ~ nest_dist,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model4 <- update(turtle_model_prior4, prior_PD = FALSE)


# Simulate the prior distribution
turtle_model_prior5 <- stan_glm(nested ~ nest_elev,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model5 <- update(turtle_model_prior5, prior_PD = FALSE)


# Simulate the prior distribution
turtle_model_prior6 <- stan_glm(nested ~ beach_slope + dune_ht + foreshore_slope + nest_dist + nest_elev,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model6 <- update(turtle_model_prior6, prior_PD = FALSE)



posterior_interval(turtle_model1, prob = 0.80)

posterior_interval(turtle_model2, prob = 0.80)

posterior_interval(turtle_model3, prob = 0.80)

posterior_interval(turtle_model4, prob = 0.80)

posterior_interval(turtle_model5, prob = 0.80)

posterior_interval(turtle_model6, prob = 0.80)





tidy(turtle_model1)
tidy(turtle_model2)
tidy(turtle_model3)
tidy(turtle_model4)
tidy(turtle_model5)
tidy(turtle_model6)


| Predictor   | Model                                                                 | 80% CI<br>(Lower) | 80% CI<br>(Upper) |
|:------------|:----------------------------------------------------------------------|-------------------|-------------------|
| <small>B. Slope</small>    | <small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -2.246 + 0.011\text{BS}$</small> | <small>-0.160</small>       | <small>0.169</small>        |
| <small>D. Height</small>   | <small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -1.364 - 0.234\text{DH}$</small> | <small>-0.536</small>       | <small>0.021</small>        |
| <small>F. Slope</small>    | <small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -2.952 + 0.112\text{FS}$</small> | <small>0.017</small>        | <small>0.205</small>        |
| <small>N. Dist.</small>    | <small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -0.018 + 0.016\text{ND}$</small> | <small>-0.040</small>       | <small>0.002</small>        |
| <small>N. Elev.</small>    | <small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -0.483 + 0.306\text{NE}$</small> | <small>-0.891</small>       | <small>-0.102</small>       |

## Adjusted Analyses

<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -0.963 - 0.074\text{BS} - 0.232\text{DH} + 0.178\text{FS} + 0.012\text{ND} - 0.980\text{NE}$ </small>

| Predictor       | Adjusted 80% CI (Lower) | Adjusted 80% CI (Upper) |
|-----------------|------------------------:|------------------------:|
| Beach Slope     |                  -0.314 |                   0.144 |
| Dune Height     |                  -0.555 |                   0.038 |
| Foreshore Slope |                   0.073 |                   0.287 |
| Nest Distance   |                  -0.039 |                   0.057 |
| Nest Elevation  |                  -1.822 |                  -0.100 |

## Diagnostics




neff_ratio(turtle_model1) #rhat and neff ratio 
rhat(turtle_model1)

neff_ratio(turtle_model2)
rhat(turtle_model2)

neff_ratio(turtle_model3)
rhat(turtle_model3)

neff_ratio(turtle_model4)
rhat(turtle_model4)

neff_ratio(turtle_model5)
rhat(turtle_model5)

neff_ratio(turtle_model6)
rhat(turtle_model6)


mcmc_trace(turtle_model1, size = 0.1)
mcmc_trace(turtle_model2, size = 0.1)
mcmc_trace(turtle_model3, size = 0.1)
mcmc_trace(turtle_model4, size = 0.1)
mcmc_trace(turtle_model5, size = 0.1)
mcmc_trace(turtle_model6, size = 0.1)




mcmc_dens_overlay(turtle_model1)
mcmc_dens_overlay(turtle_model2)
mcmc_dens_overlay(turtle_model3)
mcmc_dens_overlay(turtle_model4)
mcmc_dens_overlay(turtle_model5)
mcmc_dens_overlay(turtle_model6)
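
To make clear what `rhat()` is checking, here is a minimal base R sketch of the classic Gelman-Rubin statistic on simulated chains. The function name `rhat_basic` and the simulated draws are hypothetical. The packages used above compute a more refined version, so this illustrates the idea and does not reproduce their output.

```r
set.seed(1)
n_iter  <- 1000
n_chain <- 4

# Simulated draws: one column per chain, all sampling the same N(0, 1) target
draws <- matrix(rnorm(n_iter * n_chain), nrow = n_iter, ncol = n_chain)

# Classic (non-split) Gelman-Rubin R-hat: compares between- and within-chain variance
rhat_basic <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  chain_vars  <- apply(draws, 2, var)
  W <- mean(chain_vars)              # average within-chain variance
  B <- n * var(chain_means)          # between-chain variance
  var_hat <- (n - 1) / n * W + B / n # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

rhat_good <- rhat_basic(draws)

# Shift one chain so the chains disagree, mimicking poor mixing
bad_draws <- draws
bad_draws[, 1] <- bad_draws[, 1] + 3
rhat_bad <- rhat_basic(bad_draws)

print(c(well_mixed = rhat_good, one_chain_shifted = rhat_bad))
```

Values near 1 indicate that the chains agree. A common rule of thumb treats an R-hat above about 1.05, or an effective sample size ratio below about 0.1, as a sign that the simulation needs attention.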



## First Alternative Model Considered


# Simulate the prior distribution
turtle_model_prior7 <- stan_glm(nested ~ beach_slope + dune_ht + foreshore_slope + nest_dist + nest_elev + nest_dist:nest_elev,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model7 <- update(turtle_model_prior7, prior_PD = FALSE)

tidy(turtle_model7)

posterior_interval(turtle_model7, prob = 0.80)


<small>$\text{logit}(\pi_i) = -1.371 - 0.053\text{BS} - 0.208\text{DH} + 0.173\text{FS} + 0.042\text{ND} - 0.820\text{NE} -0.015\text{ND}\times\text{NE}$</small>

| Predictor                             | Adjusted 80% CI (Lower) | Adjusted 80% CI (Upper) |
|-------------------------------|-------------------:|-------------------:|
| Beach Slope                           |                  -0.298 |                   0.184 |
| Dune Height                           |                  -0.528 |                   0.060 |
| Foreshore Slope                       |                   0.067 |                   0.281 |
| Nest Distance                         |                  -0.036 |                   0.128 |
| Nest Elevation                        |                  -1.741 |                   0.109 |
| Nest Distance $\times$ Nest Elevation |                  -0.048 |                   0.013 |

## Second Alternative Model Considered



# Simulate the prior distribution
turtle_model_prior8 <- stan_glm(nested ~ foreshore_slope + nest_elev + foreshore_slope:nest_elev,
                             data = test_w_drop, family = binomial,
                             prior_intercept = normal(0, 2.5),
                             prior = normal(0, 2.5),
                             chains = 4, iter = 5000*2, seed = 120189,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
turtle_model8 <- update(turtle_model_prior8, prior_PD = FALSE)


posterior_interval(turtle_model8, prob = 0.80)



<small>$\text{log}\left( \frac{\pi}{1-\pi}\right) = -3.326 + 0.354\text{FS} + 0.075\text{NE} - 0.126\text{FS}\times\text{NE}$</small>

| Predictor                                | 80% CI (Lower) | 80% CI (Upper) |
|------------------------------------------|---------------:|---------------:|
| Foreshore Slope                          |          0.129 |          0.607 |
| Nest Elevation                           |         -0.854 |          1.045 |
| Foreshore Slope $\times$ Nest Elevation  |         -0.268 |          0.005 |

## Conclusions

Unadjusted Models:

-   The 80% credible interval for foreshore slope lies entirely above 0, suggesting the chance of nesting increases as foreshore slope increases.

-   The 80% credible interval for nest elevation lies entirely below 0, suggesting the chance of nesting decreases as nest elevation increases.

Adjusted Model:

-   After adjusting for all predictors, we see similar results:

    -   Higher foreshore slope is associated with a higher probability of nesting.

    -   Higher elevation is associated with a lower probability of nesting.

## Code


